{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Concise Implementation of Multilayer Perceptron"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now that we learned how multilayer perceptrons (MLPs) work in theory, let’s implement them. We begin, as always, by\n",
    "importing modules."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "import sys\n",
    "sys.path.insert(0, '..')\n",
    "import d2l\n",
    "from d2l.data import load_data_fashion_mnist\n",
    "from d2l.train import train_ch3\n",
    "\n",
    "import torch\n",
    "import torch.nn as nn\n",
    "import torch.nn.functional as F\n",
    "import torch.optim as optim"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# The Model"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The only difference from our softmax regression implementation is that we add two Dense (fully-connected) layers\n",
    "instead of one. The first is our hidden layer, which has 256 hidden units and uses the ReLU activation function."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Net(\n",
      "  (linear_1): Linear(in_features=784, out_features=256, bias=True)\n",
      "  (linear_2): Linear(in_features=256, out_features=10, bias=True)\n",
      "  (relu): ReLU()\n",
      ")\n"
     ]
    }
   ],
   "source": [
    "class Net(nn.Module):\n",
    "    def __init__(self, num_inputs = 784, num_outputs = 10, num_hiddens = 256, is_training = True):\n",
    "        super(Net, self).__init__()\n",
    "        \n",
    "        self.num_inputs = num_inputs\n",
    "        self.num_outputs = num_outputs\n",
    "        self.num_hiddens = num_hiddens\n",
    "        \n",
    "        self.linear_1 = nn.Linear(num_inputs, num_hiddens)\n",
    "        self.linear_2 = nn.Linear(num_hiddens, num_outputs)\n",
    "        \n",
    "        self.relu = nn.ReLU()\n",
    "    def forward(self, X):\n",
    "        X = X.reshape((-1, self.num_inputs))\n",
    "        H1 = self.relu(self.linear_1(X))\n",
    "        out = self.linear_2(H1)\n",
    "        return out   \n",
    "    \n",
    "net = Net()\n",
    "print(net) "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Training the model follows the exact same steps as in our softmax regression implementation."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "epoch 1, loss 0.0030, train acc 0.718, test acc 0.673\n",
      "epoch 2, loss 0.0019, train acc 0.818, test acc 0.834\n",
      "epoch 3, loss 0.0016, train acc 0.844, test acc 0.832\n",
      "epoch 4, loss 0.0015, train acc 0.857, test acc 0.838\n",
      "epoch 5, loss 0.0014, train acc 0.865, test acc 0.786\n",
      "epoch 6, loss 0.0014, train acc 0.870, test acc 0.847\n",
      "epoch 7, loss 0.0013, train acc 0.878, test acc 0.856\n",
      "epoch 8, loss 0.0013, train acc 0.882, test acc 0.859\n",
      "epoch 9, loss 0.0012, train acc 0.884, test acc 0.834\n",
      "epoch 10, loss 0.0012, train acc 0.889, test acc 0.860\n"
     ]
    }
   ],
   "source": [
    "num_epochs, lr, batch_size = 10, 0.5, 256\n",
    "train_iter, test_iter = load_data_fashion_mnist(batch_size)\n",
    "criterion = nn.CrossEntropyLoss()\n",
    "train_ch3(net, train_iter, test_iter, criterion, num_epochs, batch_size, lr)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Exercises"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "1. Try adding a few more hidden layers to see how the result changes.\n",
    "2. Try out different activation functions. Which ones work best?\n",
    "3. Try out different initializations of the weights."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# References"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "[1] Srivastava, N., Hinton, G., Krizhevsky, A., Sutskever, I., & Salakhutdinov, R. (2014). JMLR"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.8"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
